Crate serde_arrow


serde_arrow - convert sequences of Rust objects to / from Arrow arrays

The Arrow in-memory format is a powerful way to work with data-frame-like structures. However, the API of the underlying Rust crates can at times be cumbersome to use due to the statically typed nature of Rust.

serde_arrow offers a simple way to convert Rust objects into Arrow arrays and back. It relies on the Serde package to interpret Rust objects. Therefore, adding support for serde_arrow to custom types is as easy as using Serde’s derive macros.

In the Rust ecosystem there are two competing implementations of the Arrow in-memory format: arrow and arrow2. serde_arrow supports both for schema tracing and serialization from Rust structs to arrays. Deserialization from arrays to Rust structs is currently only implemented for arrow2.

Overview

The functions come in pairs: some work on single arrays, i.e., the series of a data frame; others work on multiple arrays, i.e., data frames themselves.

Functions working on multiple arrays expect sequences of records in Rust, e.g., a vector of structs. Functions working on single arrays expect vectors of array elements.
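
To make the distinction between records and arrays concrete, the following is a minimal, std-only sketch that transposes a vector of structs into one vector per field and back. It does not use serde_arrow and is not how the crate is implemented; the names Record, RecordColumns, to_columns and to_records are hypothetical and exist only for this illustration.

```rust
/// A single record (row), as a multi-array function would expect it.
#[derive(Debug, Clone, PartialEq)]
struct Record {
    a: f32,
    b: i32,
}

/// Column-oriented counterpart of a `Vec<Record>`: one vector per field.
/// Each vector plays the role of a single array (a series).
#[derive(Debug, Default, PartialEq)]
struct RecordColumns {
    a: Vec<f32>,
    b: Vec<i32>,
}

/// Transpose a sequence of records into per-field columns.
fn to_columns(records: &[Record]) -> RecordColumns {
    let mut columns = RecordColumns::default();
    for record in records {
        columns.a.push(record.a);
        columns.b.push(record.b);
    }
    columns
}

/// Rebuild the records by walking all columns in lockstep.
fn to_records(columns: &RecordColumns) -> Vec<Record> {
    columns
        .a
        .iter()
        .zip(columns.b.iter())
        .map(|(&a, &b)| Record { a, b })
        .collect()
}
```

In this picture, a function working on multiple arrays takes something like the Vec<Record>, while a function working on a single array takes something like the Vec<f32> of one field.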

Example

Requires one of the arrow2 features (see below).

use serde::Serialize;
use serde_arrow::{
    schema::TracingOptions,
    arrow2::{serialize_into_fields, serialize_into_arrays},
};

#[derive(Serialize)]
struct Example {
    a: f32,
    b: i32,
}

let records = vec![
    Example { a: 1.0, b: 1 },
    Example { a: 2.0, b: 2 },
    Example { a: 3.0, b: 3 },
];

// Auto-detect the arrow types. Result may need to be overwritten and
// customized, see serde_arrow::schema::Strategy for details.
let fields = serialize_into_fields(&records, TracingOptions::default())?;
let arrays = serialize_into_arrays(&fields, &records)?;

The generated arrays can then be written to disk, e.g., as Parquet, and loaded in another system.

use arrow2::{chunk::Chunk, datatypes::Schema};

// write_chunk is not defined here; for how to write a chunk to Parquet
// see https://jorgecarleitao.github.io/arrow2/io/parquet_write.html
write_chunk(
    "example.pq",
    Schema::from(fields),
    Chunk::new(arrays),
)?;

Features:

Which version of arrow or arrow2 is used can be selected via features. By default no arrow implementation is used. In that case only the base features of serde_arrow are available.

The arrow-* and arrow2-* feature groups are compatible with each other, i.e., it is possible to use arrow and arrow2 together. Within each group the highest version is selected if multiple features are activated. E.g., when selecting arrow2-0-16 and arrow2-0-17, arrow2=0.17 will be used.

Available features:

Feature        Arrow Version
arrow-39       arrow=39
arrow-38       arrow=38
arrow-37       arrow=37
arrow-36       arrow=36
arrow-35       arrow=35
arrow2-0-17    arrow2=0.17
arrow2-0-16    arrow2=0.16
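
The rule that the highest version within a feature group wins can be modelled in a few lines of plain Rust. This is only a sketch of the rule as stated above, not the mechanism serde_arrow itself uses (that happens at compile time via Cargo features); select_version is a hypothetical helper, and the group prefixes "arrow-" and "arrow2-" are assumed from the feature names in the table.

```rust
/// Return the highest version among the enabled features of one group,
/// formatted with dots (e.g. "0.17"), or None if the group is not enabled.
/// `prefix` identifies the group, e.g. "arrow-" or "arrow2-".
fn select_version(enabled: &[&str], prefix: &str) -> Option<String> {
    enabled
        .iter()
        // Keep only features of this group and drop the group prefix.
        .filter_map(|feature| feature.strip_prefix(prefix))
        // Parse "0-17" into [0, 17]; skip anything that is not numeric.
        .filter_map(|version| {
            version
                .split('-')
                .map(|part| part.parse::<u32>().ok())
                .collect::<Option<Vec<u32>>>()
        })
        // Vectors compare lexicographically, so [0, 17] > [0, 16].
        .max()
        .map(|parts| {
            parts
                .iter()
                .map(|part| part.to_string())
                .collect::<Vec<_>>()
                .join(".")
        })
}
```

Because "arrow2-0-17" does not start with "arrow-", the two groups are resolved independently, which mirrors the statement that arrow and arrow2 can be used together.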

Modules

  • Internal. Do not use
  • Support for the arrow crate (requires one of the arrow-* features)
  • Support for the arrow2 crate (requires one of the arrow2-* features)
  • The basic machinery powering serde_arrow
  • Experimental functionality that is not bound by semver compatibility
  • Helpers to configure how Arrow and Rust types are translated into one another

Enums

  • Common errors during serde_arrow’s usage

Type Definitions

  • A Result type that defaults to serde_arrow’s Error type